feat: harness effectiveness benchmark suite (#49) #52

Merged
rlaope merged 2 commits into main from feat/issue-49 on Apr 9, 2026
Conversation

rlaope (Owner) commented Apr 9, 2026

Closes #49

Summary

  • 10-scenario benchmark (hallucination, platform, security, deprecated)
  • Harness ON: 100% catch rate (9/9), 0 false positives
  • Harness OFF: 0% catch rate (0/9)
  • Run with npm run benchmark; results are saved automatically to benchmarks/results/

Test plan

  • 11 benchmark tests all pass
  • All 682 existing tests pass
  • npm run benchmark runs correctly

rlaope added 2 commits April 9, 2026 11:04
Benchmark framework that measures bestwork harness gates against
known-bad code scenarios. 10 scenarios across 4 categories
(hallucination, platform, security, deprecated). Harness ON catches
100% (9/9), vanilla catches 0%. Zero false positives.

- benchmarks/harness-benchmark.test.ts with 10 scenarios
- npm run benchmark command
- Auto-saves JSON results to benchmarks/results/

Signed-off-by: rlaope <rlaope@users.noreply.github.com>
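
To make the ON vs OFF comparison in this commit concrete, here is a minimal sketch of how such a run could be scored. It is not the code in benchmarks/harness-benchmark.test.ts: the `Scenario` shape, the `toyGate` check, and `runBenchmark` are all hypothetical names invented for illustration, and the toy gate stands in for the real bestwork gates.

```typescript
// Hypothetical shapes; the real benchmark's types are not shown in this PR.
type Category = "hallucination" | "platform" | "security" | "deprecated";

interface Scenario {
  name: string;
  category: Category;
  code: string;
  shouldCatch: boolean; // false marks a negative (clean) scenario
}

interface Summary {
  caught: number;
  catchable: number;
  falsePositives: number;
}

// Stand-in for a harness gate (assumption): flags two known-bad patterns.
function toyGate(code: string): boolean {
  return /\beval\(/.test(code) || code.includes("new Buffer(");
}

// Scores every scenario with the gate enabled or disabled.
function runBenchmark(scenarios: Scenario[], harnessOn: boolean): Summary {
  const summary: Summary = { caught: 0, catchable: 0, falsePositives: 0 };
  for (const s of scenarios) {
    // With the harness off nothing is gated, so nothing is ever flagged.
    const flagged = harnessOn ? toyGate(s.code) : false;
    if (s.shouldCatch) {
      summary.catchable += 1;
      if (flagged) summary.caught += 1;
    } else if (flagged) {
      summary.falsePositives += 1;
    }
  }
  return summary;
}

const scenarios: Scenario[] = [
  { name: "eval on user input", category: "security", code: "eval(userInput)", shouldCatch: true },
  { name: "deprecated Buffer constructor", category: "deprecated", code: "new Buffer(10)", shouldCatch: true },
  { name: "clean code", category: "security", code: "JSON.parse(text)", shouldCatch: false },
];

const on = runBenchmark(scenarios, true);
const off = runBenchmark(scenarios, false);
```

In this toy run `on` counts both known-bad scenarios as caught with no false positives, while `off` catches none, which mirrors the shape of the 9/9 vs 0/9 result reported above.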
…cenarios

- Add benchmarks/results/ to .gitignore, remove committed result file
- Remove unused imports (execSync, rmSync, readFileSync, tmpdir)
- Fix byCategory report: separate catchable vs negative denominator
- Add relative import detection to simulateReviewHook
- Add 3 new scenarios: missing relative import, 2 false-positive guards
- 13 scenarios, 100% accuracy, 0 false positives

Signed-off-by: rlaope <rlaope@users.noreply.github.com>
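
The byCategory fix in this commit (separate denominators for catchable and negative scenarios) can be sketched roughly as follows. This is an assumption about the intent, not the actual report code: `ScenarioResult`, `CategoryStats`, `byCategory`, and `rate` are hypothetical names. The point is that negative guards should not dilute a category's catch rate, and catchable scenarios should not dilute its false-positive rate.

```typescript
// Hypothetical result shape; field names are assumptions for illustration.
interface ScenarioResult {
  category: string;
  shouldCatch: boolean; // true = known-bad code, false = false-positive guard
  flagged: boolean;     // what the gate reported
}

interface CategoryStats {
  caught: number;
  catchable: number;      // denominator for the catch rate
  falsePositives: number;
  negatives: number;      // denominator for the false-positive rate
}

// Groups results per category, counting catchable and negative scenarios separately.
function byCategory(results: ScenarioResult[]): Record<string, CategoryStats> {
  const report: Record<string, CategoryStats> = {};
  for (const r of results) {
    const stats =
      report[r.category] ??
      { caught: 0, catchable: 0, falsePositives: 0, negatives: 0 };
    if (r.shouldCatch) {
      stats.catchable += 1;
      if (r.flagged) stats.caught += 1;
    } else {
      stats.negatives += 1;
      if (r.flagged) stats.falsePositives += 1;
    }
    report[r.category] = stats;
  }
  return report;
}

// Returns null instead of dividing by zero when a category has no such scenarios.
function rate(numerator: number, denominator: number): number | null {
  return denominator === 0 ? null : numerator / denominator;
}

const report = byCategory([
  { category: "hallucination", shouldCatch: true, flagged: true },
  { category: "hallucination", shouldCatch: false, flagged: false },
  { category: "security", shouldCatch: true, flagged: true },
]);

const hallucinationCatchRate = rate(
  report["hallucination"].caught,
  report["hallucination"].catchable,
);
const securityFalsePositiveRate = rate(
  report["security"].falsePositives,
  report["security"].negatives,
);
```

With a single shared denominator the hallucination category here would report 1 of 2 caught; with separate denominators it reports 1 of 1 caught and 0 of 1 false positives.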
rlaope merged commit 7e4c02d into main Apr 9, 2026
1 of 2 checks passed
rlaope deleted the feat/issue-49 branch April 9, 2026 02:17
Development

Successfully merging this pull request may close these issues.

feat: bestwork ON vs OFF benchmark suite
